Skip to content

Fix the benchmarks #1001

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Sep 28, 2018
Merged

Fix the benchmarks #1001

merged 14 commits into from
Sep 28, 2018

Conversation

adamsitnik
Copy link
Member

  1. Both I and @davidwrighton run today into an issue with NuGet packages restoring. ML.NET defines the sources in Directory.Builds.props file, but as suggested by @agocke in Exclude Directory.Build.props/targets from generated csproj files BenchmarkDotNet#854 BDN ignores those files and needs the classic nuget.config file to work. (I know it's not perfect)
  2. In different config files for train and predict benchmarks #954 I missed the fact that there is no global config. This change sets the RecommendedConfig as default and when[TrainConfig] is used it overwrites the RecommendedConfig. Few types were missing config ([Config(typeof(SomeConfig))]), and they were using the default config from BenchmarkDotNet
  3. I mentioned how to download external dependencies in the README file

protected virtual Job GetJobDefinition()
=> Job.Default
.WithWarmupCount(1) // ML.NET benchmarks are typically CPU-heavy benchmarks, 1 warmup is usually enough
.WithMaxIterationCount(20);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamsitnik I think we should make the default configuration as the train config as most of the tests are running in the train configuration ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Anipik thank you for the suggestion! The TrainConfig is a very specific config, I would prefer to not make it a default one

nuget.config Outdated
<configuration>
<packageSources>
<!--To inherit the global NuGet package sources remove the <clear/> line below -->
<clear />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eerhardt will this affect anything else, or will they all keep using the MSBuild source?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to not have this list duplicated. NuGet.configs are not great because they cannot be modified (either adding new sources or removing sources, or changing locations) without actually modifying the file.

Instead, would it be possible to:

  1. pull the RestoreSources property out of Directory.Build.props and put it in a separate file, say build\NuGetSources.props. Directory.Build.props imports that file.
  2. Change the Benchmark toolchain to import the build\NuGetSources.props file.

That way we don't need to duplicate this list.


In reply to: 220023911 [](ancestors = 220023911)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eerhardt I have changed the way it works: now our project generator generates a simple .csproj file which includes the native dependencies and does not ignore Directory.Build.props file so BDN is using the nuget feeds defined in Directory.Build.props

@adamsitnik
Copy link
Member Author

I disabled the MemoryDiagnoser, now it can be enabled on demand via console args: -m or --memory. I did this because MemoryDiagnoser requires one extra iteration, so if benchmark takes 30s to run we needed additional 30s to get the memory diagnostics. So now the long running benchmarks will take less time (cc @justinormont)

@adamsitnik
Copy link
Member Author

@Anipik I have registered all the required assemblies so the benchmarks work again

@danmosemsft I added one integration test that runs once simple benchmark that has a dependency to native dll and checks if the benchmark executed correctly. I don't want the benchmarks to be broken again ;)

@adamsitnik adamsitnik changed the title Fix some of the benchmarks Fix the benchmarks Sep 27, 2018
Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @adamsitnik. Thanks for enhancing these benchmarks. I just had a few random questions/thoughts.

@@ -19,4 +23,39 @@ internal class EmptyWriter : TextWriter
internal static readonly EmptyWriter Instance = new EmptyWriter();
public override Encoding Encoding => null;
}

internal static class EnvironmentFactory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) can this go in its own file?

{
var environment = new ConsoleEnvironment(verbose: false, sensitivity: MessageSensitivity.None, outWriter: EmptyWriter.Instance);

environment.ComponentCatalog.RegisterAssembly(typeof(TLoader).Assembly);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just have a single extension method on IHostEnvironment that just registered all types/assemblies. Then callers wouldn't need to pass in the TTypes.

For example, like this

public static TEnvironment AddStandardComponents<TEnvironment>(this TEnvironment env)
where TEnvironment : IHostEnvironment
{
env.ComponentCatalog.RegisterAssembly(typeof(TextLoader).Assembly); // ML.Data
env.ComponentCatalog.RegisterAssembly(typeof(LinearPredictor).Assembly); // ML.StandardLearners
env.ComponentCatalog.RegisterAssembly(typeof(CategoricalTransform).Assembly); // ML.Transforms
env.ComponentCatalog.RegisterAssembly(typeof(FastTreeBinaryPredictor).Assembly); // ML.FastTree
env.ComponentCatalog.RegisterAssembly(typeof(EnsemblePredictor).Assembly); // ML.Ensemble
env.ComponentCatalog.RegisterAssembly(typeof(KMeansPredictor).Assembly); // ML.KMeansClustering
env.ComponentCatalog.RegisterAssembly(typeof(PcaPredictor).Assembly); // ML.PCA
env.ComponentCatalog.RegisterAssembly(typeof(Experiment).Assembly); // ML.Legacy
return env;
}

and you wouldn't need two Create methods. Just one that creates the environment and registers all the assemblies.


private ITestOutputHelper Output { get; }

[Fact(Skip = SkipTheDebug)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully understand the goal of this test. It is just testing that copying native dependencies in a BDN project works correctly? It won't catch breaks in our actual benchmark tests, right?

I'm sort of wondering if this test is valuable going forward or not.... What kind of changes is it guarding against?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of this test is to make sure that we can build and run the benchmarks. So far they got broken few times, an example: nuget.config file was removed, BDN was ignoring the Directory.Builds.props file which contained the nuget feeds list and it was failing to restore one of the dependencies. Few people pinged me with the exact same problem.

With this test I am confident that there will be no breaking changes for the benchmarks.

I would love to run the benchmarks as part of the test, however, they need a lot of time to execute. Some even 300s for single benchmark invocation

<ItemGroup>
<ProjectReference Include="..\Microsoft.ML.Benchmarks\Microsoft.ML.Benchmarks.csproj" />

<NativeAssemblyReference Include="FastTreeNative" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using FastTree in the test?

{
Add(DefaultConfig.Instance); // this config contains all of the basic settings (exporters, columns etc)

Add(GetJobDefinition() // jod defines how many times given benchmark should be executed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jod type-o

</ItemGroup>
<ItemGroup>
<ProjectReference Include=""{GetProjectFilePath(buildPartition.RepresentativeBenchmarkCase.Descriptor.Type, logger).FullName}"" />
<NativeAssemblyReference Include=""CpuMathNative"" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to include all of the <NativeAssemblyReference from the calling BDN project instead of hard-coding them here? That way they only need to be defined in ML.Benchmarks.csproj.

@adamsitnik adamsitnik merged commit 17ee205 into dotnet:master Sep 28, 2018
{
public static string DatasetNotFound = "Could not find {0} Please ensure you have run 'build.cmd -- /t:DownloadExternalTestFiles /p:IncludeBenchmarkData=true' from the root";
}

// Adding this class to not print anything to the console.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Anipik or @adamsitnik - FYI - whoever is in the benchmark tests again. A new class that just came in is the LocalEnvironment, which doesn't print to the console by default. We could remove this class and switch our environment from ConsoleEnvironment to LocalEnvironment. That will work around the issue as well, and is simpler.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants